Retrieval-Augmented Clinical Benchmarking for Contextual Model Testing in Kenyan Primary Care: A Methodology Paper

Mutisya, Fred, Gitau, Shikoh, Syovata, Christine, Oigara, Diana, Matende, Ibrahim, Aden, Muna, Ali, Munira, Nyotu, Ryan, Marion, Diana, Nyangena, Job, Ongoma, Nasubo, Mbae, Keith, Wamicha, Elizabeth, Mibuari, Eric, Nsengemana, Jean Philbert, Chidede, Talkmore

arXiv.org Artificial Intelligence

Large Language Models (LLMs) hold promise for improving healthcare access in low-resource settings, but their effectiveness in African primary care contexts remains under-explored. We present a rigorous methodology for creating a benchmark dataset and evaluation framework focused on Kenyan Level 2-3 (dispensary and health center) clinical care. Our approach leverages retrieval-augmented generation (RAG) to ground questions and answers in Kenya's national clinical guidelines, ensuring content aligns with local standard-of-care. The guidelines were digitised, chunked, and indexed for efficient semantic retrieval. Gemini Flash 2.0 Lite was then prompted with relevant guideline excerpts to generate realistic clinical questions, multiple-choice answers, and reasoning scenarios with source citations in English and Swahili. We engaged Kenyan physicians in a co-creation process to refine the dataset's relevance and fairness, and instituted a blinded expert validation pipeline to review for clinical accuracy, clarity, and cultural appropriateness. The resulting Alama Health QA dataset comprises thousands of regulator-aligned question-answer pairs spanning common outpatient conditions in English and Swahili. Beyond standard accuracy metrics, we propose innovative evaluation measures targeting clinical reasoning, safety, and adaptability (e.g. ...). Initial results highlight significant performance gaps in state-of-the-art LLMs when confronted with localized scenarios, echoing recent findings that LLM accuracy on African medical questions lags behind performance on U.S. benchmarks. Our work demonstrates a pathway for dynamic, locally-grounded benchmarks that can evolve with guidelines, providing a crucial tool for safe and effective deployment of AI in African healthcare. Advances in large language models have spurred interest in their potential to augment medical services, especially in low- and middle-income countries facing clinician shortages (Bekbolatova et al., 2024).
By handling routine queries or providing decision support, LLMs might help bridge gaps in healthcare access across Africa.
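
The chunk-and-retrieve step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`chunk_words`, `retrieve`), the chunk size and overlap, and the placeholder passages are all hypothetical, and a bag-of-words cosine similarity stands in for the semantic embedding index the paper refers to.

```python
import math
import re
from collections import Counter


def chunk_words(text, size=8, overlap=2):
    """Split text into overlapping word windows (sizes are illustrative assumptions)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    step = size - overlap
    # Stop early enough that the final window is not a pure repeat of the overlap.
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def vectorize(text):
    """Bag-of-words term counts; a stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = vectorize(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)
    return ranked[:k]


# Placeholder passages, invented for illustration (not guideline text).
passages = [
    "placeholder passage about fever assessment in children",
    "placeholder passage about wound cleaning and dressing",
]
top = retrieve("wound dressing", passages, k=1)
```

In a RAG setup of the kind described, the retrieved excerpts would then be placed in the prompt of the generating model; that step is omitted here because the abstract does not specify the prompt format.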


Unlocking Location Intelligence: A Survey from Deep Learning to The LLM Era

Hao, Xixuan, Jiang, Yutian, Zou, Xingchen, Liu, Jiabo, Yin, Yifang, Liang, Yuxuan

arXiv.org Artificial Intelligence

Location Intelligence (LI), the science of transforming location-centric geospatial data into actionable knowledge, has become a cornerstone of modern spatial decision-making. The rapid evolution of Geospatial Representation Learning is fundamentally reshaping LI development through two successive technological revolutions: the deep learning breakthrough and the emerging large language model (LLM) paradigm. While deep neural networks (DNNs) have demonstrated remarkable success in automated feature extraction from structured geospatial data (e.g., satellite imagery, GPS trajectories), the recent integration of LLMs introduces transformative capabilities for cross-modal geospatial reasoning and unstructured geo-textual data processing. This survey presents a comprehensive review of geospatial representation learning across both technological eras, organizing them into a structured taxonomy based on the complete pipeline comprising: (1) data perspective, (2) methodological perspective and (3) application perspective. We also highlight current advancements, discuss existing limitations, and propose potential future research directions in the LLM era. This work offers a thorough exploration of the field and provides a roadmap for further innovation in LI. The summary of the up-to-date paper list can be found at https://github.com/CityMind-Lab/Awesome-Location-Intelligence and will undergo continuous updates.


HealthBench: Evaluating Large Language Models Towards Improved Human Health

Arora, Rahul K., Wei, Jason, Hicks, Rebecca Soskin, Bowman, Preston, Quiñonero-Candela, Joaquin, Tsimpourlas, Foivos, Sharman, Michael, Shah, Meghan, Vallone, Andrea, Beutel, Alex, Heidecke, Johannes, Singhal, Karan

arXiv.org Artificial Intelligence

HealthBench consists of 5,000 multi-turn conversations between a model and an individual user or healthcare professional. Responses are evaluated using conversation-specific rubrics created by 262 physicians. Unlike previous multiple-choice or short-answer benchmarks, HealthBench enables realistic, open-ended evaluation through 48,562 unique rubric criteria spanning several health contexts (e.g., emergencies, transforming clinical data, global health) and behavioral dimensions (e.g., accuracy, instruction following, communication). HealthBench performance over the last two years reflects steady initial progress (compare GPT-3.5 Turbo's 16% to GPT-4o's 32%) and more rapid recent improvements (o3 scores 60%). Smaller models have especially improved: GPT-4.1 nano outperforms GPT-4o and is 25 times cheaper. We additionally release two HealthBench variations: HealthBench Consensus, which includes 34 particularly important dimensions of model behavior validated via physician consensus, and HealthBench Hard, where the current top score is 32%. We hope that HealthBench grounds progress towards model development and applications that benefit human health.


RideKE: Leveraging Low-Resource, User-Generated Twitter Content for Sentiment and Emotion Detection in Kenyan Code-Switched Dataset

Etori, Naome A., Gini, Maria L.

arXiv.org Artificial Intelligence

Social media has become a crucial open-access platform for individuals to express opinions and share experiences. However, leveraging low-resource language data from Twitter is challenging due to scarce, poor-quality content and the major variations in language use, such as slang and code-switching. Identifying tweets in these languages can be difficult as Twitter primarily supports high-resource languages. We analyze Kenyan code-switched data and evaluate four state-of-the-art (SOTA) transformer-based pretrained models for sentiment and emotion classification, using supervised and semi-supervised methods. We detail the methodology behind data collection and annotation, and the challenges encountered during the data curation phase. Our results show that XLM-R outperforms other models: for sentiment analysis, the XLM-R supervised model achieves the highest accuracy (69.2%) and F1 score (66.1%), with XLM-R semi-supervised at 67.2% accuracy and 64.1% F1 score. In emotion analysis, DistilBERT supervised leads in accuracy (59.8%) and F1 score (31%), with mBERT semi-supervised at 59% accuracy and 26.5% F1 score. AfriBERTa models show the lowest accuracy and F1 scores. All models tend to predict neutral sentiment, with Afri-BERT showing the highest bias and unique sensitivity to empathy emotion. https://github.com/NEtori21/Ride_hailing


Uchaguzi-2022: A Dataset of Citizen Reports on the 2022 Kenyan Election

Mondini, Roberto, Kotonya, Neema, Logan, Robert L. IV, Olson, Elizabeth M, Lungati, Angela Oduor, Odongo, Daniel Duke, Ombasa, Tim, Lamba, Hemank, Cahill, Aoife, Tetreault, Joel R., Jaimes, Alejandro

arXiv.org Artificial Intelligence

Online reporting platforms have enabled citizens around the world to collectively share their opinions and report in real time on events impacting their local communities. Systematically organizing (e.g., categorizing by attributes) and geotagging large amounts of crowdsourced information is crucial to ensuring that accurate and meaningful insights can be drawn from this data and used by policy makers to bring about positive change. These tasks, however, typically require extensive manual annotation efforts. In this paper we present Uchaguzi-2022, a dataset of 14k categorized and geotagged citizen reports related to the 2022 Kenyan General Election containing mentions of election-related issues such as official misconduct, vote count irregularities, and acts of violence. We use this dataset to investigate whether language models can assist in scalably categorizing and geotagging reports, thus highlighting its potential application in the AI for Social Good space.


DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations

Gema, Aryo Pradipta, Jin, Chen, Abdulaal, Ahmed, Diethe, Tom, Teare, Philip, Alex, Beatrice, Minervini, Pasquale, Saseendran, Amrutha

arXiv.org Artificial Intelligence

Large Language Models (LLMs) often hallucinate, producing unfaithful or factually incorrect outputs by misrepresenting the provided context or incorrectly recalling internal knowledge. Recent studies have identified specific attention heads within the Transformer architecture, known as retrieval heads, responsible for extracting relevant contextual information. We hypothesise that masking these retrieval heads can induce hallucinations and that contrasting the outputs of the base LLM and the masked LLM can reduce hallucinations. To this end, we propose Decoding by Contrasting Retrieval Heads (DeCoRe), a novel training-free decoding strategy that amplifies information found in the context and model parameters. DeCoRe mitigates potentially hallucinated responses by dynamically contrasting the outputs of the base LLM and the masked LLM, using conditional entropy as a guide. Our extensive experiments confirm that DeCoRe significantly improves performance on tasks requiring high contextual faithfulness, such as summarisation (XSum by 18.6%), instruction following (MemoTrap by 10.9%), and open-book question answering (NQ-Open by 2.4% and NQ-Swap by 5.5%).
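
The contrast between base and masked model outputs described in the abstract above can be sketched for a single decoding step. The abstract does not give the formula, so the combination rule used here, `(1 + alpha) * log p_base - alpha * log p_masked` with `alpha` set to the entropy of the base model's next-token distribution, is an assumption about how such a contrast could work rather than the paper's stated method. The function names and the toy logits are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-probabilities from raw logits."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]


def entropy(log_probs):
    """Shannon entropy (in nats) of a distribution given as log-probabilities."""
    return -sum(math.exp(lp) * lp for lp in log_probs)


def contrast_next_token(base_logits, masked_logits):
    """Contrast base and masked next-token distributions (assumed formula).

    alpha grows with the base model's uncertainty, so the contrast is
    stronger when the base model is less sure of its next token.
    """
    base_lp = log_softmax(base_logits)
    masked_lp = log_softmax(masked_logits)
    alpha = entropy(base_lp)
    scores = [(1 + alpha) * b - alpha * m for b, m in zip(base_lp, masked_lp)]
    contrasted = [math.exp(lp) for lp in log_softmax(scores)]
    return contrasted, alpha


# Toy logits over a 3-token vocabulary, invented for illustration.
base = [2.0, 1.0, 0.1]      # base model favours token 0
masked = [0.5, 1.0, 0.1]    # with retrieval heads masked, token 0 loses support
probs, alpha = contrast_next_token(base, masked)
```

Tokens whose probability drops when the retrieval heads are masked are boosted by the contrast, which is the intuition the abstract describes; when the two models agree exactly, the contrasted distribution reduces to the base distribution.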


Kenyan Sign Language (KSL) Dataset: Using Artificial Intelligence (AI) in Bridging Communication Barrier among the Deaf Learners

Wanzare, Lilian, Okutoyi, Joel, Kang'ahi, Maurine, Ayere, Mildred

arXiv.org Artificial Intelligence

Kenyan Sign Language (KSL) is the primary language used by the deaf community in Kenya. It is the medium of instruction from Pre-primary 1 to university among deaf learners, facilitating their education and academic achievement. Kenyan Sign Language is used for social interaction, expression of needs, making requests and general communication among persons who are deaf in Kenya. However, there exists a language barrier between the deaf and the hearing people in Kenya. Thus, the innovation on AI4KSL is key in eliminating the communication barrier. Artificial intelligence for KSL is a two-year research project (2023-2024) that aims to create a digital open-access AI of spontaneous and elicited data from a representative sample of the Kenyan deaf community. The purpose of this study is to develop an AI assistive technology dataset that translates English to KSL as a way of fostering inclusion and bridging language barriers among deaf learners in Kenya. The specific objectives are to build a KSL dataset for spoken English and video-recorded Kenyan Sign Language, and to build transcriptions of the KSL signs to a phonetic-level interface of the sign language. In this paper, the methodology for building the dataset is described. Data was collected from 48 teachers and tutors of the deaf learners and 400 learners who are Deaf. Participants engaged mainly in sign language elicitation tasks through reading and singing. The dataset consists of about 14,000 English sentences with corresponding KSL Gloss derived from a pool of about 4,000 words, and about 20,000 signed KSL videos that are either signed words or sentences. The second level of data outcomes consists of 10,000 split and segmented KSL videos. The third outcome of the dataset consists of 4,000 words transcribed into five articulatory parameters according to the HamNoSys system.


Artificial Intelligence for Public Health Surveillance in Africa: Applications and Opportunities

Tshimula, Jean Marie, Kalengayi, Mitterrand, Makenga, Dieumerci, Lilonge, Dorcas, Asumani, Marius, Madiya, Déborah, Kalonji, Élie Nkuba, Kanda, Hugues, Galekwa, René Manassé, Kumbu, Josias, Mikese, Hardy, Tshimula, Grace, Muabila, Jean Tshibangu, Mayemba, Christian N., Nkashama, D'Jeff K., Kalala, Kalonji, Ataky, Steve, Basele, Tighana Wenge, Didier, Mbuyi Mukendi, Kasereka, Selain K., Dialufuma, Maximilien V., Kumwita, Godwill Ilunga Wa, Muyuku, Lionel, Kimpesa, Jean-Paul, Muteba, Dominique, Abedi, Aaron Aruna, Ntobo, Lambert Mukendi, Bundutidi, Gloria M., Mashinda, Désiré Kulimba, Mpinga, Emmanuel Kabengele, Kasoro, Nathanaël M.

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) is revolutionizing various fields, including public health surveillance. In Africa, where health systems frequently encounter challenges such as limited resources, inadequate infrastructure, failed health information systems and a shortage of skilled health professionals, AI offers a transformative opportunity. This paper investigates the applications of AI in public health surveillance across the continent, presenting successful case studies and examining the benefits, opportunities, and challenges of implementing AI technologies in African healthcare settings. Our paper highlights AI's potential to enhance disease monitoring and health outcomes, and to support effective public health interventions. The findings presented in the paper demonstrate that AI can significantly improve the accuracy and timeliness of disease detection and prediction, optimize resource allocation, and facilitate targeted public health strategies. Additionally, our paper identifies key barriers to the widespread adoption of AI in African public health systems and proposes actionable recommendations to overcome these challenges.